Abstract

The magnification behaviour of a generalized family of self-organizing feature maps, the Winner Relaxing and Winner Enhancing Kohonen algorithms, is analyzed by means of the magnification law, which can be obtained analytically in the one-dimensional case. The Winner-Enhancing case makes it possible to achieve a magnification exponent of one and therefore provides optimal mapping in the sense of information theory. A numerical verification of the magnification law is included, and the ordering behaviour is analyzed. Compared to the original Self-Organizing Map and some other approaches, the generalized Winner Enforcing Algorithm requires minimal extra computation per learning step and is easy to implement.

Keywords: self-organizing maps, Kohonen algorithm, mutual information, magnification exponent 



The Self-Organizing Map defined by Kohonen in 1982 [1] started a class of highly successful neural network models, since it has proven relevant both for modeling biological networks and for engineering artificial neurocomputing systems, including data analysis and self-organized dimension reduction of complex structured, high-dimensional input spaces.

The qualitative biological inspiration comes from cortical receptor fields, such as the receptor fields of the retina and of the skin. In both cases, neighbouring sensory inputs are always represented by cortical activity at neighbouring locations in the neural tissue. Such topology-preserving neural maps have been known for a long time, and apart from error-tolerant computation, two striking properties are known. First, there is high plasticity: e.g. when a finger is cut off, the neurons now lacking sensory input begin to specialize for the adjacent fingers. Second, as the complete structure (all synaptic weights, even of only parts of the brain) is far too complex to be coded genetically, the detailed structure has to emerge from a self-organizing process, evidently driven stochastically by the sensory input.

While the quantitative structure of biological maps can be modeled successfully even with the simple Kohonen map, there are many variants and modifications leading to qualitatively similar self-organizing topology-preserving maps. The approaches used to define and discuss these are as different as the resulting algorithms, but they can be roughly categorized as follows: (a) derivation from first principles, such as energy or cost functions, mutual information, averaged representation error, distortion measures, and any combination of these; (b) argumentation from structure, ranging from a realistic biological model to the discussion of optimal technical implementations; (c) extraction of measurable quantitative properties, such as magnification laws, properties of fluctuation, ordering, adaptation, error tolerance, and the spatial frequency of singularities under dimension reduction, as in the retina; and finally (d) restriction to the simplest possible models, which is rather a physicist's point of view.

As feature maps are used for different aims, these naturally lead to different viewpoints. For some technical applications, it may be convenient to use any kind of vector quantization, e.g. the Kohonen model without any neighborhood interactions, and to apply a sorting algorithm afterwards to set up topological order. In the brain, however, the topological structure is assumed to be set up by self-organization; therefore we focus here on self-organizing maps only.

Compared to the Elastic Net Algorithm of Durbin and Willshaw [2] and the Linsker Algorithm [3], which perform gradient descent in a certain energy landscape, the Kohonen algorithm does not appear to have an energy function. Although the learning process can be described in terms of a Fokker-Planck equation [4], the expectation value of the learning step is a nonconservative force [5] driving the process, so it has no associated energy function. Furthermore, the relationships of the Kohonen model to both alternative models and general principles are still an open field [6].
In this paper we analyze an algorithm which is a gradient system (and therefore has an extremal principle) in terms of the principle of maximizing mutual information. As one has to assume the existence of a functional that is optimized by evolution, the question of extremal principles is central in theoretical brain research. Mutual information is assumed to play an essential role in any energy landscape that may describe the evolution of the brain. To measure optimality in the sense of information theory, we consider the magnification law for one-dimensional maps. The magnification factor is defined as the number of synaptic weight vectors (i.e. neurons) per unit volume of input space. Maps of maximal mutual information show a power law with exponent 1, but the algorithm given by Linsker [3] that achieves this requires a complicated learning rule; an exponent of 0, on the other hand, corresponds to no adaptation to the stimuli at all. An exponent of 1 is equivalent to all neurons having the same firing probability.

The leading question is whether there are models that are suitable for describing biological maps and that show a sufficient magnification behaviour. The exponents for the Kohonen [7] and the Linsker algorithm have been known for a long time. The Elastic Net also shows a universal (i.e. depending only on the local stimulus density) magnification law, which however is not a power law [8], and for serial presentation it does not allow for both stability and infomax mapping.

With the generalized Winner Relaxing Kohonen algorithm, however, an exponent of 1 can be achieved by inverting the 'relaxing' term, with only a minimal computational extension of the algorithm.

1 The Kohonen Self Organizing Feature Map 

The Kohonen algorithm for Self Organizing Feature Maps is defined as follows: every stimulus $\mathbf{v}$ of a Euclidean input space $V$ is mapped to the neuron at position $\mathbf{s}$ in the neural layer $R$ whose weight vector $\mathbf{w}_\mathbf{s}$ has the minimal distance to $\mathbf{v}$ in input space, corresponding to the highest neural activity. This neuron is called the 'center of excitation' or 'winner' (Fig. 1).
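As a minimal sketch of the winner selection just described (plain Python, weight vectors as tuples; the function name `find_winner` is illustrative and not taken from the text), the center of excitation is the index with the smallest squared Euclidean distance, which has the same minimizer as the distance itself:

```python
def find_winner(v, weights):
    """Return the index s of the neuron whose weight vector is closest to v.

    v       -- stimulus, a sequence of floats (a point in input space V)
    weights -- list of weight vectors w_r, one per neuron r
    Ties are resolved in favour of the lowest index.
    """
    def sq_dist(w):
        # Squared Euclidean distance; same argmin as the distance itself.
        return sum((vi - wi) ** 2 for vi, wi in zip(v, w))
    return min(range(len(weights)), key=lambda r: sq_dist(weights[r]))

weights = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(find_winner((0.9, 0.2), weights))
```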




Fig. 1: A Self-Organizing Mapping $\Phi$ from an input space $V$ to a neural layer $R$. Every stimulus $\mathbf{v}$ is assigned to a center of excitation $\mathbf{s}$. The weight vectors $\mathbf{w}_\mathbf{r}$ change according to a learning rule that defines each map algorithm.



In the Kohonen model the learning rule for each synaptic weight vector $\mathbf{w}_\mathbf{r}$ is given by

$$\delta \mathbf{w}_\mathbf{r} = \eta \, g_{\mathbf{r}\mathbf{s}} \, (\mathbf{v} - \mathbf{w}_\mathbf{r}) \qquad (1)$$

with $g_{\mathbf{r}\mathbf{s}}$ a Gaussian function of the Euclidean distance $|\mathbf{r} - \mathbf{s}|$ in the neural layer. The function $g_{\mathbf{r}\mathbf{s}}$ describes the topology in the neural layer. The parameter $\eta$ determines the speed of learning and can be adjusted during the learning process. Topology preservation is enforced by the common update of all weight vectors whose neuron $\mathbf{r}$ is adjacent to the center of excitation $\mathbf{s}$.
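The following sketch applies Eq. (1) to a one-dimensional chain with scalar weights. The Gaussian is assumed to be $g_{rs} = \exp(-|r-s|^2 / 2\gamma^2)$ with a width parameter `gamma` (the text only specifies "a Gaussian", so this normalization and the helper names are assumptions):

```python
import math

def gaussian_neighbourhood(r, s, gamma):
    """Assumed neighbourhood kernel g_rs = exp(-|r - s|^2 / (2 gamma^2))."""
    return math.exp(-((r - s) ** 2) / (2.0 * gamma ** 2))

def kohonen_step(weights, v, eta, gamma):
    """One learning step of Eq. (1) on a 1-D chain with scalar weights (updated in place)."""
    # Center of excitation: the neuron whose weight is closest to the stimulus.
    s = min(range(len(weights)), key=lambda r: abs(v - weights[r]))
    for r in range(len(weights)):
        # Every neuron moves towards v, scaled by its neighbourhood value g_rs.
        weights[r] += eta * gaussian_neighbourhood(r, s, gamma) * (v - weights[r])
    return weights

print(kohonen_step([0.0, 0.5, 1.0], v=0.6, eta=0.5, gamma=1.0))
```

The winner is determined before any weight is changed, so all neurons are updated with respect to the same center of excitation.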
2 Salesman Travelling to Town and Countryside: Adaptation in discrete and continuous input spaces

It is illustrative to consider a low-dimensional example of how the Self-Organizing Map adapts to the structure of the stimulus data, which is defined only by the input probability density. Here the neural layer is chosen to be one-dimensional, and the first and last neurons are connected by periodic boundary conditions.

If the probability density is given by a finite sum of delta peaks then, provided a suitable parameter decay, the neural weights of the Self-Organizing Map converge to these stimuli, as shown in Fig. 2, and the map approximately finds the shortest route visiting all stimuli [9].
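A compact sketch of such a ring map for delta-peaked input follows. The exponential decay schedules for $\eta$ and the neighbourhood width, the initial small circle, and the number of neurons (twice the number of cities) are illustrative assumptions, not parameters taken from the text; the tour is read off by ordering the cities by the ring position of their winning neuron.

```python
import math
import random

def ring_distance(r, s, n):
    """Distance between neurons r and s on a ring of n neurons (periodic boundary)."""
    d = abs(r - s) % n
    return min(d, n - d)

def nearest(point, w):
    """Index of the weight vector closest to a 2-D point."""
    return min(range(len(w)), key=lambda r: (point[0] - w[r][0]) ** 2 + (point[1] - w[r][1]) ** 2)

def som_tsp(cities, n_neurons=None, steps=10000, eta0=0.8, eta1=0.01, seed=0):
    """Approximate a TSP tour with a 1-D ring Kohonen map (illustrative sketch)."""
    rng = random.Random(seed)
    n = n_neurons or 2 * len(cities)
    gamma0, gamma1 = n / 4.0, 0.5          # assumed neighbourhood schedule, in neurons
    cx = sum(c[0] for c in cities) / len(cities)
    cy = sum(c[1] for c in cities) / len(cities)
    # Start on a small circle around the centroid of the cities.
    w = [[cx + 0.1 * math.cos(2 * math.pi * k / n), cy + 0.1 * math.sin(2 * math.pi * k / n)]
         for k in range(n)]
    for t in range(steps):
        frac = t / steps
        eta = eta0 * (eta1 / eta0) ** frac          # exponential decay eta0 -> eta1
        gamma = gamma0 * (gamma1 / gamma0) ** frac  # exponential decay gamma0 -> gamma1
        v = cities[rng.randrange(len(cities))]      # stimulus: one of the delta peaks
        s = nearest(v, w)
        for r in range(n):
            g = math.exp(-ring_distance(r, s, n) ** 2 / (2.0 * gamma ** 2))
            w[r][0] += eta * g * (v[0] - w[r][0])
            w[r][1] += eta * g * (v[1] - w[r][1])
    # Visit the cities in the order of their winning neurons along the ring.
    return sorted(range(len(cities)), key=lambda i: nearest(cities[i], w))

cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.2)]
print(som_tsp(cities))
```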
Fig. 2: Solving the Travelling Salesman Problem with the Kohonen Map.

If we call the stimuli 'cities', we recognize this as the famous Travelling Salesman Problem, which is NP-hard: it is widely believed (though not proven) that no algorithm can compute the optimal solution in a time that scales polynomially with the number of cities, and all known exact algorithms scale exponentially.

Other algorithms that also give near-optimal solutions have been proposed: the thermodynamically motivated Simulated Annealing of Kirkpatrick [10], the neural approach of Hopfield and Tank [11], and the Elastic Net of Durbin and Willshaw [2]. The latter two have been found to be limiting cases of a unified approach [12].

If we now replace the input probability density by a continuous one, which may be constant within the country and zero outside (shown as background in Fig. 3), the weight vectors will try to cover the countryside as well as possible, balancing the conflicting aims of preserving topology as far as possible and minimizing the reconstruction error. The necessary dimension reduction takes place through a snake-like folding of the weights, which locally step up and down in the excess dimensions. For the input coming from the retina, this dimension reduction task is harder (from 5 to 2 dimensions) [5].
Fig. 3: For continuous input spaces of higher dimension, the Self-Organizing Map approximates the input by meandering structures.

3 The Magnification Factor 

Depending on the input probability density $P(v)$ of the stimuli (now assumed to be continuous), an adaptive map algorithm can devote more neurons to regions of higher probability density, which are then represented with higher resolution.

The magnification factor is defined as the density of neurons $r$ (i.e. the density of synaptic weight vectors $w_r$) per unit volume of input space. It is therefore given by the inverse of the Jacobian $J$ of the mapping from the neural layer to input space: $M = |J|^{-1} = |\det(dw/dr)|^{-1}$. (In the following we consider the one-dimensional case of non-inverting mappings, where $J$ is positive.) The magnification factor is a property of the network's response to a given probability density of stimuli $P(v)$. To evaluate $M$ in higher dimensions, one in general has to compute the equilibrium state of the whole network and therefore needs complete global knowledge of $P(v)$.
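For a one-dimensional chain with unit neuron spacing, $J$ can be approximated by the difference of adjacent weights, which gives a simple estimate of $M = 1/J$. The sketch below (the name `magnification_profile` is hypothetical) assumes a non-inverting map, i.e. strictly increasing weights:

```python
def magnification_profile(weights):
    """Estimate M = 1/J along a 1-D chain, with J ~ w_{r+1} - w_r (unit neuron spacing).

    Returns (midpoints, M): each estimate is attached to the input-space point
    halfway between two adjacent weights."""
    mids, mags = [], []
    for a, b in zip(weights, weights[1:]):
        J = b - a
        if J <= 0:
            raise ValueError("expected a non-inverting map (strictly increasing weights)")
        mids.append(0.5 * (a + b))
        mags.append(1.0 / J)
    return mids, mags

mids, mags = magnification_profile([0.0, 0.1, 0.3, 0.6])
print(mids, mags)
```

Closely spaced weights (small $J$) give a large magnification, i.e. a region of input space represented by many neurons.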

For one-dimensional mappings (and possibly for special geometric cases in higher dimensions), the magnification factor can follow a universal magnification law, that is, $M(w(r))$ is a function only of the local probability density $P$ and independent of both the location $r$ in the neural layer and the location $w(r)$ in input space.

An optimal map from the viewpoint of information theory would reproduce the input probability density exactly ($M \propto P(v)^p$ with $p = 1$), i.e. follow a power law with exponent 1. This is equivalent to the condition that all neurons in the layer fire with the same probability, since the firing probability of a neuron is approximately $P/M$, the stimulus density times the width of its Voronoi cell. An exponent $p = 0$, on the other hand, corresponds to a uniform distribution of weight vectors, which means there is no adaptation to the stimuli at all. The magnification exponent is therefore a direct indicator of how far a Self Organizing Map algorithm is from the optimum predicted by information theory.
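The equivalence between $p = 1$ and equal firing probabilities can be checked numerically without any learning. The sketch below is an idealized construction, not a simulation of any map algorithm: it places 100 weights at the quantiles of $P^p$ for the exponential density $P(v) \propto \exp(-4v)$ on $[0, 1]$ (the density used in Section 6), lets each neuron fire for stimuli in its Voronoi cell, and reports the ratio of the largest to the smallest firing probability, discarding the outer 10% of neurons on each side. All function names are hypothetical.

```python
import math

BETA = 4.0  # stimulus density proportional to exp(-BETA v) on [0, 1]

def cdf(x, a):
    """Normalized CDF on [0, 1] of a density proportional to exp(-a x)."""
    if a == 0:
        return x
    return (1.0 - math.exp(-a * x)) / (1.0 - math.exp(-a))

def inv_cdf(u, a):
    """Inverse of cdf(., a) for u in [0, 1]."""
    if a == 0:
        return u
    return -math.log(1.0 - u * (1.0 - math.exp(-a))) / a

def firing_spread(p, n=100, trim=0.1):
    """Max/min firing probability of n neurons whose density is proportional to P(v)^p."""
    # Weights at the quantiles (r + 0.5)/n of P^p ~ exp(-p * BETA * v).
    w = [inv_cdf((r + 0.5) / n, p * BETA) for r in range(n)]
    # Voronoi cells in 1-D are bounded by midpoints between neighbouring weights.
    bounds = [0.0] + [0.5 * (a + b) for a, b in zip(w, w[1:])] + [1.0]
    probs = [cdf(hi, BETA) - cdf(lo, BETA) for lo, hi in zip(bounds, bounds[1:])]
    k = int(trim * n)  # drop boundary neurons
    inner = probs[k:n - k]
    return max(inner) / min(inner)

for p in (0.0, 2.0 / 3.0, 1.0):
    print(f"p = {p:.3f}: max/min firing probability = {firing_spread(p):.3f}")
```

Only $p = 1$ yields a ratio close to one; for $p = 0$ the neurons near $v = 0$ fire far more often than those near $v = 1$.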

As the brain is assumed to have been optimized by evolution for information processing, one would postulate that maximal mutual information is a sound principle governing the setup of neural structures. Such an algorithm of maximal mutual information has been defined by Linsker [3] using gradient ascent in mutual information. It requires computationally costly integrations and has no local learning rule, or any other learning rule, that would allow for a biological motivation.

However, both biological network structures and technical applications are (due to realization constraints) not necessarily capable of reaching this optimum; for the brain in particular, this is still under discussion [13]. Even if one had quantitative experimental measurements of the magnification behaviour, the question of which self-organizing dynamics the neural structure emerged from would remain. Overall, it is therefore desirable to formulate other learning rules that maximize mutual information in a simpler way.

To start with the simplest algorithm: for the classical Kohonen algorithm, the magnification law (for one-dimensional mappings) is given by a power law $M(w(r)) \propto P(w(r))^p$ with exponent $p = 2/3$ [7]. Although for a discrete neural layer, and especially for neighborhood kernels of different shape and range, there are corrections to the magnification law [9, 14, 15], we consider the limit of a continuous neural layer and restrict our analysis to the one-dimensional case.

4 The Generalized Winner Relaxing Kohonen Algorithm

We now consider an energy function that was first proposed in [9] for the classical Kohonen algorithm and is given by the mean squared reconstruction error of the resulting map. For continuous input spaces, or when the borders of the Voronoi tessellation shift across a localized stimulus, corrections are required.

If an energy function is desired, one can turn the argument around: if the Self-Organizing Map has no energy function, and the squared reconstruction error is only an approximate one, one can start from this energy formula and derive the learning rule that results from it.

Using some approximations, Kohonen has shown [6] for the one- and two-dimensional case that gradient descent in the mean squared reconstruction error results in a slightly different learning rule for the winning neuron only, because the borders of the Voronoi tessellation also shift when one evaluates the gradient with respect to a weight vector.

As the additional learning term implies an additional elastic relaxation of the winning neuron, it is natural to call it the 'Winner Relaxing' (WR) Kohonen algorithm. Since the relaxing term acts in one direction only (the winner is relaxed towards its neighbours, but the neighbours are not attracted by the winner), it cannot strictly be interpreted as an elastic force or physical interaction.

It is straightforward to generalize the Winner Relaxing algorithm by introducing the free parameter $\lambda$ into the generalized Winner Relaxing Kohonen map [16]
where $\mathbf{s}$ is the center of excitation for the incoming stimulus $\mathbf{v}$, and $g^\gamma_{\mathbf{r}\mathbf{s}}$ is a Gaussian function of distance in the neural layer with characteristic length $\gamma$. The original algorithm proposed by Kohonen [6] is obtained for $\lambda = +1/2$, whereas the classical Self Organizing Map algorithm is obtained for $\lambda = 0$. (Note that only for $\lambda = +1/2$ is the algorithm associated with a potential function.)
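The equation of the generalized rule is not reproduced in this text. The sketch below assumes the form $\delta \mathbf{w}_\mathbf{r} = \eta \left[ g^\gamma_{\mathbf{r}\mathbf{s}} (\mathbf{v} - \mathbf{w}_\mathbf{r}) - \lambda \, \delta_{\mathbf{r}\mathbf{s}} \sum_{\mathbf{r}' \neq \mathbf{s}} g^\gamma_{\mathbf{r}'\mathbf{s}} (\mathbf{v} - \mathbf{w}_{\mathbf{r}'}) \right]$. To our reading this form is consistent with the surrounding text: it reduces to Eq. (1) for $\lambda = 0$, changes only the winner's update, its extra winner term is $|\lambda|$ times the sum over all other neighbourhood-weighted update terms (matching the stability remark in Section 6), and with this sign convention a second-order continuum expansion gives the exponent $2/(3+\lambda)$ of Eq. (5). It should nevertheless be read as an assumption, as should the Gaussian normalization and the function name `wrk_step`.

```python
import math

def wrk_step(weights, v, eta, gamma, lam):
    """One step of the generalized Winner Relaxing Kohonen rule on a 1-D chain.

    ASSUMED form (not reproduced in the text):
        dw_r = eta * [g_rs (v - w_r) - lam * delta_rs * sum_{r' != s} g_r's (v - w_r')]
    lam = 0: classical SOM; lam = +1/2: Kohonen's Winner Relaxing rule;
    lam < 0: Winner Enhancing. Returns a new list; the input is not modified."""
    n = len(weights)
    s = min(range(n), key=lambda r: abs(v - weights[r]))                     # winner
    g = [math.exp(-((r - s) ** 2) / (2.0 * gamma ** 2)) for r in range(n)]   # assumed Gaussian
    # Extra winner-only term, evaluated with the weights before this update.
    extra = -lam * sum(g[r] * (v - weights[r]) for r in range(n) if r != s)
    new = [w + eta * g[r] * (v - w) for r, w in enumerate(weights)]
    new[s] += eta * extra
    return new

for lam in (0.5, 0.0, -1.0):
    print(lam, wrk_step([0.0, 0.5, 1.0], v=0.6, eta=0.5, gamma=1.0, lam=lam))
```

Compared with a plain Kohonen step, the only additional cost is one weighted sum that is already available from the neighbourhood update, which is the "minimal computational extension" mentioned in the introduction.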

5 Magnification Exponent of the Generalized Winner-Relaxing Kohonen Algorithm

The necessary condition for the final state of the algorithm is that the expectation value of the learning step vanishes for all neurons. This gives a Chapman-Kolmogorov equation for the stochastic learning process of serial presentation, which can also be used to derive the magnification law of the Winner-Relaxing Kohonen algorithm [16]: inserting the update rule into the stationarity condition and integrating yields, with $P := P(w(r))$ and $J := dw/dr$, the differential equation



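$$\gamma^2 \left[ J \, \frac{d(PJ)}{dr} + \frac{PJ}{2} \, \frac{dJ}{dr} + \lambda \, \frac{PJ}{2} \, \frac{dJ}{dr} \right] = 0 \qquad (3)$$

For $\gamma \neq 0$, $P \neq 0$, $dP/dr \neq 0$, and making the ansatz $J(r) = J(P(r))$ of a universal local magnification law (which may be expected in the one-dimensional case), we obtain the differential equation

$$\frac{dJ}{dP} = -\frac{2}{3+\lambda} \, \frac{J}{P} \qquad (4)$$

with its solution (provided that $\lambda \neq -3$) [16]:

$$M = \frac{1}{J} \propto P(v)^{\frac{2}{3+\lambda}} \qquad (5)$$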
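Eq. (5) can be tabulated exactly with rational arithmetic. This short sketch (the function name is illustrative) evaluates $p = 2/(3+\lambda)$ for the three cases discussed in the text:

```python
from fractions import Fraction

def magnification_exponent(lam):
    """p = 2 / (3 + lam) from Eq. (5); lam = -3 is excluded."""
    lam = Fraction(lam)
    if lam == -3:
        raise ValueError("lambda = -3 is excluded")
    return Fraction(2) / (3 + lam)

cases = [(0, "classical SOM"), (Fraction(1, 2), "Winner Relaxing"), (-1, "Winner Enhancing")]
for lam, name in cases:
    print(f"{name:18s} lambda = {str(lam):>4s}  p = {magnification_exponent(lam)}")
# p = 2/3, 4/7 and 1 respectively
```

Restricting $|\lambda| \le 1$ confines the exponent to the interval from $1/2$ ($\lambda = 1$) to $1$ ($\lambda = -1$).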



For the Winner-Relaxing Kohonen Algorithm ($\lambda = 1/2$), the magnification factor follows an exact power law with magnification exponent $p = 4/7$, which is smaller than the exponent $p = 2/3$ of the classical Self Organizing Feature Map. Although the Winner-Relaxing Kohonen Algorithm is 'somewhat faster' [6] in the initial ordering process, the resulting invariant mapping is slightly less optimal in terms of information theory.

This result suggests inverting the relaxing effect by choosing negative values of $\lambda$. The choice $\lambda = -1$ leads to a magnification exponent of one, provided the algorithm is stable for this parameter choice. This is tested by the numerical experiment described below.
6 Numerical Verification of the Magnification Law of the Winner Enforcing and Relaxing Algorithms

We used the following numerical setup: the network maps the unit interval onto a one-dimensional neural chain of 100 neurons. The learning rate was 0.1. The stimulus probability density was chosen to be exponential, $\exp(-\beta w)$ with $\beta = 4$. After an adaptation process of $5 \cdot 10^7$ steps (Elastic Net), a further 10% of learning steps were used to calculate the average slope of $\log J$ as a function of $\log P$ and its fluctuation. (The first and last 10% of the neurons were excluded to eliminate boundary effects.) The results are shown for several parameters in Fig. 4.
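The slope measurement can be sketched as follows, for a single snapshot of a converged chain rather than the time average used in the text. $J$ is approximated by a central difference, the outer 10% of neurons are discarded, and $p$ is the negative slope of a least-squares fit of $\log J$ against $\log P$ (since $M = 1/J$). The usage lines check the estimator on a synthetic chain whose neuron density is exactly proportional to $P^{0.5}$; all names are hypothetical.

```python
import math

def fit_magnification_exponent(weights, density, trim=0.1):
    """Estimate p in M ~ P^p from a converged, increasing 1-D chain of weights.

    J_r ~ (w_{r+1} - w_{r-1}) / 2 (central difference, unit neuron spacing);
    the outer `trim` fraction of neurons on each side is discarded; a least-squares
    line is fitted to log J versus log P, and p = -slope because M = 1/J."""
    n = len(weights)
    k = max(1, int(trim * n))  # k >= 1 keeps r - 1 and r + 1 inside the chain
    xs, ys = [], []
    for r in range(k, n - k):
        xs.append(math.log(density(weights[r])))
        ys.append(math.log(0.5 * (weights[r + 1] - weights[r - 1])))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic check: weights at the quantiles of P^0.5 for P(v) ~ exp(-beta v) on [0, 1].
beta, p_true, n = 4.0, 0.5, 100
a = p_true * beta
w = [-math.log(1.0 - (r + 0.5) / n * (1.0 - math.exp(-a))) / a for r in range(n)]
print(fit_magnification_exponent(w, lambda x: math.exp(-beta * x)))  # close to 0.5
```

The density only needs to be known up to a constant factor, since a constant shifts $\log P$ without changing the slope.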







Fig. 4: Numerical verification of the WRK/WEK magnification law. The upper line is the theoretical prediction, followed by $\gamma = 5.0$, where the error bars show that the theoretical prediction is met within the precision. The lower lines are for $\gamma = 2.0$, 1.0, 0.5, and 0.1, showing that a neighborhood interaction of system size destroys adaptation.

For the Winner Relaxing and Enforcing Algorithm family, we performed simulations for different $\lambda$ and neighborhood lengths $\gamma$. For small $\gamma$, the neighborhood interaction becomes too weak. If the Gaussian neighborhood extends over several neurons ($\gamma = 2$, $\gamma = 5$), the exponent follows the predicted dependence on $\lambda$ given by $2/(3+\lambda)$. For $|\lambda| > 1$ we found the system to become unstable; this is the case where the additional update term of the winner is larger than the sum over all other update terms in the whole network. The algorithm remains stable on both stability borders $\lambda = +1$ and $\lambda = -1$, and the escape time diverges when these boundaries are approached from outside. A detailed discussion of the numerical study will be found in [21]. As the relaxing effect of $\lambda > 0$ is inverted for $\lambda < 0$, fluctuations are larger than in the Kohonen case.

Besides showing that the exponent can be varied between 1/2 and 1 by an a priori choice of $\lambda$, the simulations show that our Winner Enforcing Algorithm is in fact able to establish information-theoretically optimal self-organizing maps.



7 Ordering behaviour of the WR and WE Kohonen maps

One might suspect that inverting a smoothing term leads to larger fluctuations that could increase the time needed for convergence. Here the ordering behaviour is analyzed for a standard setup of 100 neurons with weights and stimuli uniform in the unit interval, and a high learning rate $\eta = 1$ corresponding to the parameters used in the initial ordering phase (at this high learning rate, the ordering time increased by orders of magnitude outside the interval shown).

Fig. 5 shows the averaged number of learning steps per neuron needed until a monotonically increasing or decreasing list of weights was reached. Contrary to the initial expectation, the minimal ordering time is found neither for $\lambda = 0$ (Self-Organizing Map) nor for the Winner-Relaxing case ($\lambda = 1/2$) associated with an energy function. This gives the surprising result that quicker ordering is not incompatible with near-infomax mapping, and both can be realized with the Winner-Enhancing Kohonen Algorithm.
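A sketch of the ordering experiment under stated assumptions: weights and stimuli uniform on $[0, 1]$, $\eta = 1$ as in the text, the ordering criterion of a strictly monotonic weight list, and a generalized update assumed to take the form $\delta w_r = \eta \left[ g_{rs}(v - w_r) - \lambda \, \delta_{rs} \sum_{r' \neq s} g_{r's}(v - w_{r'}) \right]$ with a Gaussian $g_{rs}$ (the equation itself is not reproduced in this text). The neighbourhood width `gamma = 2.0`, the step cap, and the seed are hypothetical choices; the function returns learning steps per neuron, or `None` if ordering is not reached within the cap.

```python
import math
import random

def is_ordered(weights):
    """True if the weights form a strictly increasing or strictly decreasing list."""
    pairs = list(zip(weights, weights[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

def ordering_time(lam, n=100, eta=1.0, gamma=2.0, max_steps_per_neuron=10_000, seed=0):
    """Learning steps per neuron until the chain is ordered, or None if the cap is hit.

    Uses the ASSUMED generalized WRK update
        dw_r = eta * [g_rs (v - w_r) - lam * delta_rs * sum_{r' != s} g_r's (v - w_r')]
    with g_rs = exp(-(r - s)^2 / (2 gamma^2)); weights and stimuli uniform on [0, 1]."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n)]
    g = [math.exp(-d * d / (2.0 * gamma ** 2)) for d in range(n)]  # indexed by |r - s|
    for t in range(1, max_steps_per_neuron * n + 1):
        v = rng.random()
        s = min(range(n), key=lambda r: abs(v - w[r]))  # winner
        # Winner-only term, computed from the weights before this update.
        extra = -lam * sum(g[abs(r - s)] * (v - w[r]) for r in range(n) if r != s)
        w = [wr + eta * g[abs(r - s)] * (v - wr) for r, wr in enumerate(w)]
        w[s] += eta * extra
        if is_ordered(w):
            return t / n
    return None

for lam in (-0.5, 0.0, 0.5):
    print(lam, ordering_time(lam, n=20, max_steps_per_neuron=300))
```

A single seed gives one sample of a random quantity; an average over many seeds, as in Fig. 5, is needed before comparing values of $\lambda$.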
Fig. 5: Ordering behavior: number of learning steps per neuron as a function of the parameter $\lambda$.

8 Other recent approaches 

After our first study [16], Herrmann et al. [17] introduced another modification of the learning process, which was also applied to the Neural Gas [18] (which, in contrast to the Kohonen map, has no fixed neighbourhood topology in the neural layer). The central idea of this approach is to make the learning rate $\eta$ depend locally on the input probability density via a power law whose exponent is related to the desired magnification exponent; with it, an exponent of 1 can also be obtained. As the input probability density should not be available to a neural map that self-organizes from stimuli drawn from that distribution, it is estimated from the actual local reconstruction mismatch (an estimate of the size of the Voronoi cell) and from the time elapsed since the neuron was last the winner. Due to this estimating character, the learning rate has to be bounded in practical use. From the computational point of view, one has to keep track of the time between successive firings of each neuron, which introduces a memory term that needs extra storage, and the local learning rate has to be computed; this appears to be more costly than the Winner Enhancing Kohonen algorithm.

Other modifications consider the selection of the winner to be probabilistic, leading to much more elegant statistical approaches to potential functions (see Graepel et al. [19] and Heskes [20]).

The robustness of the Winner-Relaxing principle has been demonstrated by transferring it to the Neural Gas architecture, where this simple approach now makes it possible to obtain a magnification exponent of 1 in arbitrary dimension [22].

9 Conclusions 

Feature maps are self-organizing structures in the brain and in computational neuroscience that can efficiently represent high-dimensional and complex input spaces. Retina, skin and other perceptual receptor areas are represented in a topology-preserving manner, i.e. if neighbouring neurons fire, the active receptor cells are also adjacent. The detailed structure of these neural maps, including all synaptic connections, cannot be coded genetically, so it appears necessary to develop models that set up their structure by a self-organizing process. The Self-Organizing Map has become the most prominent model and has been applied to many technical problems. Several other models, namely the Linsker Algorithm, the Elastic Net Algorithm and the Winner Relaxing Kohonen Algorithm, have also been considered as models for feature maps and used in technical applications. Most of them follow from an extremal principle, given by information theory, physical motivations, or reconstruction error. But what extremal principles govern the feature maps in the brain?

A final answer to this question would require more quantitative experimental data on the magnification behaviour, which would then provide a basis for judging how close nature comes to the optimum given by information theory.

Magnification-adjustable models such as the Winner-Relaxing and Winner-Enhancing Self-Organizing Maps can become a valuable tool for comparison with experiments and for further refinement of the theoretical understanding of the brain.